
14 research outputs found

ADaPTION: Toolbox and Benchmark for Training Convolutional Neural Networks with Reduced Numerical Precision Weights and Activation

Author: Aimar Alessandro
Delbruck Tobi
Indiveri Giacomo
Milde Moritz B.
Neil Daniel
Publication date: 01/01/2017

Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activations within the network, are typically stored in high-precision formats, e.g. 32-bit floating point. However, storage capacity is limited and every memory access consumes power, so both storage capacity and memory access are crucial factors in these networks. Here we present a method and the ADaPTION toolbox, which extends the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed-point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs, including VGG16, down to 16-bit weights and activations with only a 0.8% drop in Top-1 accuracy. The quantization, particularly of the activations, leads to an increase in sparsity of up to 50%, especially in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference. Comment: 10 pages, 5 figures
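The abstract describes measuring the dynamic range of a tensor and then storing it in fixed-point notation. A minimal NumPy sketch of that idea follows; it is not ADaPTION's code (which extends Caffe), and the function names, the rule for choosing integer bits (fewest bits covering the largest observed magnitude, plus one sign bit), and round-to-nearest with saturation are all assumptions made for illustration. It also shows why quantizing activations can only raise sparsity: zeros stay zero, and small positive values may round down to zero.

```python
import numpy as np

def choose_fixed_point_format(x, total_bits=16):
    """Split a signed fixed-point word into integer and fractional bits.

    Assumed rule: one bit for the sign, and the fewest integer bits that
    cover the largest magnitude observed in x (its dynamic range).
    """
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return 0, total_bits - 1
    int_bits = max(0, int(np.floor(np.log2(max_abs))) + 1)
    frac_bits = total_bits - 1 - int_bits
    return int_bits, frac_bits

def quantize(x, total_bits, frac_bits):
    """Round to the nearest representable value (NumPy rounds ties to even) and saturate."""
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(np.asarray(x) * scale), qmin, qmax)
    return q / scale

def sparsity(x):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(np.asarray(x) == 0))

# Example: quantize synthetic ReLU activations to 8 bits and compare sparsity.
rng = np.random.default_rng(0)
acts = np.maximum(0.0, rng.normal(0.0, 1.0, size=10_000))
i_bits, f_bits = choose_fixed_point_format(acts, total_bits=8)
acts_q = quantize(acts, 8, f_bits)
print(i_bits, f_bits, sparsity(acts), sparsity(acts_q))
```

Using fewer total bits leaves fewer fractional bits, so more small activations fall below half a quantization step and become zero; this is one mechanism consistent with the sparsity increase reported above.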

arXiv.org e-Print Archive

ZORA

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Author: Aimar Alessandro
Calabrese Enrico
Corradi Federico
Delbruck Tobi
Linares-Barranco Alejandro
Liu Shih-Chii
Lungu Iulia-Alexandra
Milde Moritz B.
Mostafa Hesham
Rios-Navarro Antonio
Tapiador-Morales Ricardo
Publication date: 01/01/2017

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphics Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs, ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
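The abstract says NullHop uses a sparse representation of feature maps to cut memory traffic and skip work on zero activations. The sketch below illustrates that general idea with a bitmask-plus-values encoding and a dot product that issues a multiply-accumulate (MAC) only for non-zero activations. NullHop's exact compression format and datapath are not given in this abstract, so the layout, the function names, and the 16-bit value width used in the storage estimate are assumptions. Skipping zeros is also one plausible reading of how an efficiency above 100% (such as the 368% quoted) can arise: dense-equivalent operations are counted even though only the non-zero MACs are executed.

```python
import numpy as np

def encode(fmap):
    """Compress a feature map into a 1-bit-per-element mask plus its non-zero values."""
    flat = np.asarray(fmap).ravel()
    mask = flat != 0
    return mask, flat[mask]

def decode(mask, values, shape):
    """Rebuild the dense feature map from the mask and the packed non-zero values."""
    flat = np.zeros(mask.size, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

def compressed_bits(mask, value_bits=16):
    """Storage cost under this assumed layout: one mask bit per element plus value_bits per non-zero."""
    return mask.size + value_bits * int(np.count_nonzero(mask))

def zero_skipping_dot(mask, values, weights):
    """Accumulate only over non-zero activations; return (sum, number of MACs issued)."""
    w = np.asarray(weights).ravel()
    acc = 0.0
    macs = 0
    for v, i in zip(values, np.flatnonzero(mask)):
        acc += float(v) * float(w[i])
        macs += 1
    return acc, macs

rng = np.random.default_rng(0)
fmap = np.maximum(0.0, rng.normal(size=(8, 8)))  # synthetic ReLU output, roughly half zeros
kernel = rng.normal(size=(8, 8))
mask, vals = encode(fmap)
result, macs = zero_skipping_dot(mask, vals, kernel)
print(macs, "MACs instead of", fmap.size)
print(compressed_bits(mask), "bits instead of", 16 * fmap.size)
```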

arXiv.org e-Print Archive

ZORA

Western Sydney ResearchDirect

idUS. Depósito de Investigación Universidad de Sevilla

Energy-Efficient Convolutional Neural Network Accelerators for Edge Intelligence

Author: Aimar Alessandro
Publication date: 06/08/2021

Over the last ten years, the rise of deep learning has redefined the state of the art in many computer vision and natural language processing tasks, with applications ranging from automated personal assistants and social network filtering to self-driving cars and drug development. The growth in popularity of these algorithms is rooted in the exponential increase in computing power available for their training, brought about by the spread of GPUs. The resulting increase in accuracy created demand for faster, more power-efficient hardware suited for deployment on edge devices. In this thesis, we propose a set of innovations and technologies belonging to one of the many research lines sparked by this demand, focusing on energy-efficient hardware for convolutional neural networks. We first study how a standard 28 nm CMOS process performs in the context of deep learning accelerator design, giving special consideration to the power and area of circuits based on standard cells when reduced-precision arithmetic and short SRAM memory words are used. The outcome of this analysis indicates that the power-efficiency gain from reducing the bit precision is non-linear and that it saturates at a precision of 16 bits. We propose Nullhop, an accelerator pioneering the use of feature map sparsity, typical of convolutional neural networks, and quantization to boost the hardware capabilities. Nullhop's novelty is its ability to skip every multiplication involving a zero-valued activation. It reaches a power efficiency of 3 TOP/s/W with a throughput of almost 0.5 TOP/s in 6.3 mm². We present a neural network quantization algorithm based on a hardware-software co-design approach. We demonstrate its capabilities by training several networks on various tasks such as classification, object detection, segmentation, and image generation.
The quantization scheme is implemented in Elements, a convolutional neural network accelerator architecture that supports variable weight bit precision as well as sparsity. We demonstrate the capabilities of Elements with multiple design parameterizations, suited for a wide range of applications. One of these parameterizations, called Deuterium, reaches an energy efficiency of over 4 TOP/s/W using only 3.3 mm². We further explore the concept of sparsity with a third convolutional neural network accelerator architecture called TwoNullhop, which can skip over zeros in both feature maps and kernels. We tested the TwoNullhop architecture with Carbon, an accelerator that, despite having only 128 multiply-accumulate units and running at only 500 MHz, achieves more than 2.4 TOP/s with an energy efficiency of 10.2 TOP/s/W in only 4 mm². The thesis ends with an overview of the challenges and possibilities we foresee for deep learning hardware development, attempting to predict which themes will dominate the field in the years to come.
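TwoNullhop is described as skipping zeros in both feature maps and kernels, so a MAC is needed only where an activation and its weight are both non-zero. The sketch below shows that idea as an intersection of two bitmasks over packed value lists; it is an illustration, not the TwoNullhop dataflow, and all names are hypothetical. A second helper relates Carbon's reported figures to the dense peak of its MAC array, assuming the common convention of 2 operations per MAC and that the reported TOP/s counts dense-equivalent operations (the abstract does not state its convention).

```python
import numpy as np

def encode(x):
    """Bitmask of non-zero positions plus the packed non-zero values."""
    flat = np.asarray(x, dtype=float).ravel()
    mask = flat != 0
    return mask, flat[mask]

def dual_sparse_dot(a_mask, a_vals, w_mask, w_vals):
    """Issue a MAC only where both the activation and the weight are non-zero."""
    both = a_mask & w_mask
    # Index of each element inside its own packed value list.
    a_pos = np.cumsum(a_mask) - 1
    w_pos = np.cumsum(w_mask) - 1
    idx = np.flatnonzero(both)
    # Vectorized for brevity; a hardware engine would step through idx one MAC at a time.
    acc = float(np.sum(a_vals[a_pos[idx]] * w_vals[w_pos[idx]]))
    return acc, int(idx.size)

def implied_skip_factor(effective_ops_per_s, mac_units, clock_hz, ops_per_mac=2):
    """Ratio of reported throughput to the dense peak of the MAC array (assumed 2 ops per MAC)."""
    dense_peak = mac_units * ops_per_mac * clock_hz
    return effective_ops_per_s / dense_peak

rng = np.random.default_rng(0)
acts = np.maximum(0.0, rng.normal(size=256))               # synthetic ReLU activations
weights = rng.normal(size=256) * (rng.random(256) > 0.6)   # synthetic weights, ~60% zero
a_mask, a_vals = encode(acts)
w_mask, w_vals = encode(weights)
dot, macs = dual_sparse_dot(a_mask, a_vals, w_mask, w_vals)
print(macs, "MACs instead of", acts.size)

# Carbon figures from the abstract: more than 2.4 TOP/s with 128 MAC units at 500 MHz.
print(implied_skip_factor(2.4e12, 128, 500e6))
```

Under these assumptions the dense peak of 128 MACs at 500 MHz is 0.128 TOP/s, so the last line gives 18.75: the reported throughput counts roughly 18.75 times more operations than the array could execute densely, which is only possible if most multiplications are skipped.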

ZORA

A spike-based neuromorphic stereo architecture for active vision

Author: Aimar Alessandro
Donati Elisa
Indiveri Giacomo
Risi Nicoletta
Solinas Sergio
Publication venue: Frontiers
Publication date: 28/08/2019

The problem of finding stereo correspondences in binocular vision is solved effortlessly in nature and yet is still a critical bottleneck for artificial machine vision systems. As temporal information is a crucial feature in this process, the advent of event-based vision sensors and dedicated event-based processors promises an effective approach to stereo-matching. Indeed, event-based neuromorphic hardware provides an optimal substrate for biologically-inspired, fast, asynchronous computation that can make explicit use of precise temporal coincidences. Here we present an event-based stereo-vision system that fully leverages the advantages of brain-inspired neuromorphic computing hardware by interfacing event-based vision sensors to an event-based mixed-signal analog/digital neuromorphic processor. We describe the multi-chip sensory-processing setup we developed and demonstrate a proof-of-concept implementation of cooperative stereo-matching that can be used to build brain-inspired active vision systems.
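The role of precise temporal coincidences can be illustrated with a plain-Python stand-in: an event from the left sensor is paired with events from the right sensor on the same row, with the same polarity, that arrive within a short time window, and each pair votes for a disparity. This is only a sketch of coincidence detection under stated assumptions (rectified sensors, microsecond timestamps, and hypothetical names such as `coincidence_matches`); it does not model the spiking neurons, the mixed-signal processor, or the excitatory and inhibitory interactions of the cooperative network described in the paper.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

class Event(NamedTuple):
    t: float  # timestamp, assumed to be in microseconds
    x: int    # column
    y: int    # row (sensors assumed rectified, so matches share a row)
    p: int    # polarity, +1 or -1

def coincidence_matches(left, right, window_us=1000.0, max_disparity=30):
    """Pair left/right events on the same row and polarity that occur within window_us."""
    by_row = defaultdict(list)
    for ev in right:
        by_row[(ev.y, ev.p)].append(ev)
    matches = []
    for ev in left:
        for r in by_row[(ev.y, ev.p)]:
            d = ev.x - r.x
            if 0 <= d <= max_disparity and abs(ev.t - r.t) <= window_us:
                matches.append((ev, r, d))
    return matches

def disparity_votes(matches):
    """Histogram of candidate disparities; the peak is the most supported match."""
    return Counter(d for _, _, d in matches)

# Synthetic edge moving along row 10, seen 5 pixels apart by the two sensors.
left_demo = [Event(100.0 * i, 20 + i, 10, 1) for i in range(5)]
right_demo = [Event(100.0 * i + 10.0, 15 + i, 10, 1) for i in range(5)]
print(disparity_votes(coincidence_matches(left_demo, right_demo, window_us=50.0)).most_common(1))
```

A tight time window is what rejects the false pairings here: events of the same edge at neighbouring positions arrive about 100 µs apart, outside the 50 µs window, so only the true correspondences survive.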

ZORA

Introduction to the disruptive technology in the teaching of environmental design

Author: Aimar Fabrizio
Auer Thomas
Melis Alessandro
Publication venue: Wolters Kluwer Italia
Publication date: 01/11/2017

Portsmouth University Research Portal (Pure)

Disruptive Technologies. The integration of advanced technology in architecture teaching and radical projects for the future city.

Author: Aimar Fabrizio
Auer Thomas
Melis Alessandro
Publication venue: Milanofiori Assago, Milano
Publication date: 01/01/2017

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Introduction of disruptive technology in the teaching of environmental design.

Author: Aimar Fabrizio
Auer Thomas
Melis Alessandro
Publication venue: Milanofiori Assago, Milano
Publication date: 01/01/2017

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Siamese Networks for Few-Shot Learning on Edge Embedded Devices

Author: Aimar Alessandro
Delbruck Tobi
Hu Yuhuang
Liu Shih-Chii
Lungu Iulia Alexandra
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2020

Edge artificial intelligence hardware mainly targets inference networks that have been pretrained on massive datasets. The field of few-shot learning looks for methods that allow a network to achieve high accuracy even when only a few samples of each class are available. Siamese networks can be used to tackle few-shot learning problems and are unique because they do not require retraining on the samples of new classes. Therefore, they are suitable for edge hardware accelerators, which often do not include on-chip training capabilities. This work describes improvements to a baseline Siamese network and benchmarking of the improved network on edge platforms. The modifications to the baseline network included adding multi-resolution kernels, a hybrid training process, and a different embedding similarity computation method. This network shows an average accuracy improvement of up to 22% across 4 datasets in a 5-way, 1-shot classification task. Benchmarking results using three edge computing platforms (NVIDIA Jetson Nano, Coral Edge TPU and a custom convolutional neural network accelerator) show that a Siamese classifier can run on these devices at frame rates reasonable for real-time performance, between 3 frames per second (FPS) on the Jetson Nano and 60 FPS on the Edge TPU. By increasing the weight sparsity during training, the inference frame rate of a network with 25% weight sparsity increases by 10 FPS with only a 1% drop in accuracy.
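A Siamese classifier handles an N-way, 1-shot task without retraining: it embeds the query and one support sample per class with the same network, then picks the class whose embedding is most similar. The sketch below shows that inference loop, plus magnitude pruning as one common way to raise weight sparsity. It rests on several assumptions: `make_embedder` is a hypothetical stand-in (a fixed random projection) for a trained Siamese branch, cosine similarity is chosen for illustration because the abstract does not say which similarity method the improved network uses, and magnitude pruning is not necessarily the paper's sparsification method.

```python
import numpy as np

def make_embedder(in_dim, out_dim, seed=0):
    """Hypothetical stand-in for a trained Siamese branch: a fixed random linear projection."""
    W = np.random.default_rng(seed).normal(size=(out_dim, in_dim)) / np.sqrt(in_dim)
    return lambda x: W @ np.asarray(x, dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings (assumed similarity measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def classify_n_way_one_shot(embed, query, support):
    """Return the index of the support sample (one per class) most similar to the query."""
    q = embed(query)
    scores = [cosine_similarity(q, embed(s)) for s in support]
    return int(np.argmax(scores)), scores

def magnitude_prune(w, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitudes."""
    w = np.asarray(w, dtype=float)
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

# Synthetic 5-way, 1-shot episode: the query is a noisy copy of class 3's support sample.
rng = np.random.default_rng(1)
support = rng.normal(size=(5, 64))
query = support[3] + 0.05 * rng.normal(size=64)
embed = make_embedder(64, 32)
label, scores = classify_n_way_one_shot(embed, query, support)
print(label, [round(s, 3) for s in scores])
```

Because new classes only add support samples, nothing in this loop updates weights, which is why the approach suits accelerators without on-chip training.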

ZORA

CNN-based Object Detection on Low Precision Hardware: Racing Car Case Study

Author: Aimar Alessandro
De Rita Nicolo
Delbruck Tobi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 12/06/2019

Crossref

ZORA

Energy Galleries: a sustainable opportunity for future cities.

Author: Aimar Fabrizio
Boarin Paola
Lara Hernandez Jose Antonio
Melis Alessandro
Publication venue: Milanofiori Assago, Milano
Publication date: 01/01/2017

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)